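We use a dataset of American Sign Language (ASL) collected with a Kinect V2 sensor, containing videos of both fluent and non-fluent signers. The dataset was collected as part of a project to develop and evaluate computer vision algorithms that support new technologies for automatically detecting ASL fluency attributes. In total, 45 fluent and non-fluent participants were asked to perform signing homework assignments similar to those used in introductory or intermediate ASL courses. The data are annotated to identify several aspects of signing, including grammatical features and non-manual markers. Sign language recognition is currently very data-driven, and this dataset can support the design of recognition technologies, especially those that can benefit ASL learners. The dataset may also interest ASL education researchers who want to contrast fluent and non-fluent signing.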
translated by 谷歌翻译
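Conventional self-supervised monocular depth prediction methods rely on a static-environment assumption, which degrades accuracy in dynamic scenes because object motion introduces mismatch and occlusion problems. Existing methods that focus on dynamic objects only partially solve the mismatch problem, and only at the training-loss level. In this paper, we therefore propose a novel multi-frame monocular depth prediction method that addresses these problems at both the prediction level and the supervision-loss level. Our method, called DynamicDepth, is a new framework trained through a self-supervised cycle-consistent learning scheme. A Dynamic Object Motion Disentanglement (DOMD) module is proposed to disentangle object motion and solve the mismatch problem. In addition, a novel occlusion-aware cost volume and re-projection loss are designed to alleviate the occlusion effects of object motion. Extensive analyses and experiments on the Cityscapes and KITTI datasets show that our method significantly outperforms state-of-the-art monocular depth prediction methods, especially in regions containing dynamic objects. Code is available at https://github.com/autoailab/dynamicdepth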
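The rapid emergence of airborne platforms and imaging sensors is enabling new forms of aerial surveillance, owing to their unprecedented advantages in scale, mobility, deployment, and covert observation capability. This paper provides a comprehensive overview of human-centric aerial surveillance tasks from a computer vision and pattern recognition perspective. It aims to give readers an in-depth systematic review and technical analysis of the current state of aerial surveillance tasks that use drones, UAVs, and other airborne platforms. The main objects of interest are humans, where single or multiple subjects are to be detected, identified, tracked, re-identified, and have their behavior analyzed. More specifically, for each of these four tasks, we first discuss the unique challenges of performing the task in an aerial setting compared with a ground-based setting. We then review and analyze the publicly available aerial datasets for each task, examine the approaches in the aerial literature in depth, and investigate how they currently address the aerial challenges. We conclude with a discussion of missing gaps and open research questions to inform future research avenues.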
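Self-supervised monocular depth prediction provides a cost-effective way to obtain the 3D location of each pixel. However, existing methods usually yield unsatisfactory accuracy, and accuracy is critical for autonomous robots. In this paper, we propose a novel two-stage network that advances self-supervised monocular dense depth learning by leveraging low-cost sparse (e.g., 4-beam) LiDAR. Unlike existing methods, which use sparse LiDAR mainly through time-consuming iterative post-processing, our model uses both monocular image features and sparse LiDAR features to predict initial depth maps. An efficient feed-forward refinement network is then designed to correct the errors in these initial depth maps in pseudo-3D space, with real-time performance. Extensive experiments show that our proposed model significantly outperforms all state-of-the-art self-supervised methods, as well as sparse-LiDAR-based methods, on both self-supervised monocular depth prediction and depth completion tasks. With its accurate dense depth prediction, our model outperforms the state-of-the-art sparse-LiDAR-based method (Pseudo-LiDAR++) by more than 68% on the downstream task of monocular 3D object detection on the KITTI leaderboard. Code is available at https://github.com/autoailab/fusiondepth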
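To make "the 3D location of each pixel" concrete, here is a minimal sketch of how a dense depth map can be lifted to camera-frame 3D points under a standard pinhole camera model. It is an illustration only and not code from the paper: the function name `backproject`, the intrinsic matrix `K`, and the toy depth values are all assumptions.

```python
import numpy as np

def backproject(depth, K):
    """Lift an (H, W) depth map to an (H, W, 3) array of camera-frame points.

    Assumes a pinhole camera: point = depth * K^{-1} @ [u, v, 1]^T.
    """
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)
    # Apply K^{-1} to every homogeneous pixel coordinate.
    rays = pixels @ np.linalg.inv(K).T
    return rays * depth[..., None]

# Hypothetical intrinsics: focal length 500 px, principal point at (2, 1).
K = np.array([[500.0, 0.0, 2.0],
              [0.0, 500.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 5), 2.0)  # toy depth map: every pixel is 2 m away
points = backproject(depth, K)
```

The pixel at the principal point maps to a point straight ahead of the camera, and neighboring pixels are offset laterally in proportion to their depth.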
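Scene flow depicts the dynamics of a 3D scene, which is critical for applications such as autonomous driving, robot navigation, and AR/VR. Conventionally, scene flow is estimated from dense, regular RGB video frames. With the development of depth-sensing technologies, precise 3D measurements are available through point clouds, which has sparked new research in 3D scene flow. Nevertheless, extracting scene flow from point clouds remains challenging because of the sparsity and irregularity of typical point cloud sampling patterns. One major issue related to irregular sampling is identified as the randomness during point set abstraction and feature extraction, an elementary process in many flow estimation scenarios. A novel Spatial Abstraction with Attention (SA^2) layer is therefore proposed to alleviate the unstable abstraction problem. In addition, a Temporal Abstraction with Attention (TA^2) layer is proposed to rectify attention in the temporal domain, which benefits motions scaled over a larger range. Extensive analyses and experiments verify the motivation and the significant performance gains of our method, dubbed Flow Estimation via Spatial-Temporal Attention (FESTA), compared with several state-of-the-art benchmarks for scene flow estimation.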
Large-scale labeled data are generally required to train deep neural networks in order to obtain better performance in visual feature learning from images or videos for computer vision applications. To avoid the extensive cost of collecting and annotating large-scale datasets, self-supervised learning methods, a subset of unsupervised learning methods, have been proposed to learn general image and video features from large-scale unlabeled data without using any human-annotated labels. This paper provides an extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos. First, the motivation, general pipeline, and terminology of this field are described. Then the common deep neural network architectures that are used for self-supervised learning are summarized. Next, the schema and evaluation metrics of self-supervised learning methods are reviewed, followed by the commonly used image and video datasets and the existing self-supervised visual feature learning methods. Quantitative performance comparisons of the reviewed methods on benchmark datasets are then summarized and discussed for both image and video feature learning. Finally, the paper concludes and lists a set of promising future directions for self-supervised visual feature learning.
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from a trade-off between accuracy and efficiency. Recent advances in neural operators, a class of mesh-independent neural-network-based PDE solvers, suggest that this challenge may be overcome. In this emerging direction, the Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.
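For readers unfamiliar with the Koopman viewpoint, the sketch below fits a linear operator that advances a system one step in time, using a plain least-squares fit over snapshot pairs (the basic dynamic mode decomposition idea). This is a swapped-in classical technique for illustration, not the Koopman neural operator and not the KoopmanLab API; the function name `fit_linear_operator` and the toy system are assumptions.

```python
import numpy as np

def fit_linear_operator(snapshots):
    """Least-squares fit of K such that snapshots[:, t+1] ~= K @ snapshots[:, t].

    snapshots has shape (state_dim, num_steps); this is basic DMD,
    not a neural operator.
    """
    X = snapshots[:, :-1]  # states at time t
    Y = snapshots[:, 1:]   # states at time t + 1
    return Y @ np.linalg.pinv(X)

# Toy linear system (hypothetical): a slowly decaying rotation in the plane.
theta = 0.3
A_true = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta), np.cos(theta)]])
states = [np.array([1.0, 0.0])]
for _ in range(20):
    states.append(A_true @ states[-1])
snapshots = np.stack(states, axis=1)  # shape (2, 21)

K_est = fit_linear_operator(snapshots)
```

For a system that is already linear, the fitted operator recovers the true dynamics; Koopman-based methods aim to find a space of observables in which nonlinear dynamics admit a similar linear description.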
Humans have internal models of robots (like their physical capabilities), the world (like what will happen next), and their tasks (like a preferred goal). However, human internal models are not always perfect: for example, it is easy to underestimate a robot's inertia. Nevertheless, these models change and improve over time as humans gather more experience. Interestingly, robot actions influence what this experience is, and therefore influence how people's internal models change. In this work we take a step towards enabling robots to understand the influence they have, leverage it to better assist people, and help human models more quickly align with reality. Our key idea is to model the human's learning as a nonlinear dynamical system which evolves the human's internal model given new observations. We formulate a novel optimization problem to infer the human's learning dynamics from demonstrations that naturally exhibit human learning. We then formalize how robots can influence human learning by embedding the human's learning dynamics model into the robot planning problem. Although our formulations provide concrete problem statements, they are intractable to solve in full generality. We contribute an approximation that sacrifices the complexity of the human internal models we can represent, but enables robots to learn the nonlinear dynamics of these internal models. We evaluate our inference and planning methods in a suite of simulated environments and an in-person user study, where a 7DOF robotic arm teaches participants to be better teleoperators. While influencing human learning remains an open problem, our results demonstrate that this influence is possible and can be helpful in real human-robot interaction.
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. We give an $(\epsilon_{\text{dp}}, \delta)$-differentially private algorithm which, given $n$ samples of Lipschitz loss functions, obtains near-optimal optimization error and makes $\min(n, n^2\epsilon_{\text{dp}}^2 d^{-1}) + \min(n^{4/3}\epsilon_{\text{dp}}^{1/3}, (nd)^{2/3}\epsilon_{\text{dp}}^{-1})$ queries to the gradients of these functions. In the regime $d \le n \epsilon_{\text{dp}}^{2}$, where privacy comes at no cost in terms of the optimal loss up to constants, our algorithm uses $n + (nd)^{2/3}\epsilon_{\text{dp}}^{-1}$ queries and improves recent advancements of [KLL21, AFKT21]. In the moderately low-dimensional setting $d \le \sqrt n \epsilon_{\text{dp}}^{3/2}$, our query complexity is near-linear.
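One way to read "reweighted stochastic query" is as importance weighting: gradients queried around a fixed reference point are reweighted by a ratio of Gaussian densities so that their average estimates the gradient of the Gaussian-smoothed function at a nearby point. The sketch below follows that reading. It is an assumption about the general idea, not necessarily the paper's exact construction, and the names `resque_gradient`, `x_ref`, and `rho` are hypothetical.

```python
import numpy as np

def resque_gradient(grad_f, x, x_ref, rho, num_samples, rng):
    """Estimate the gradient at x of f convolved with N(0, rho^2 I).

    All gradient queries are made at x_ref + xi with xi ~ N(0, rho^2 I),
    then reweighted by gamma(x - x_ref - xi) / gamma(xi), where gamma is
    the N(0, rho^2 I) density. The weighted average is unbiased.
    """
    d = x.shape[0]
    xi = rng.normal(scale=rho, size=(num_samples, d))
    delta = x - x_ref
    # Log of the density ratio; the Gaussian normalizing constants cancel.
    log_w = (np.sum(xi**2, axis=1)
             - np.sum((delta - xi)**2, axis=1)) / (2.0 * rho**2)
    weights = np.exp(log_w)
    grads = grad_f(x_ref + xi)  # shape (num_samples, d)
    return np.mean(weights[:, None] * grads, axis=0)

# Toy check with f(y) = 0.5 * ||y||^2, whose gradient is y. The smoothed
# gradient at x is then exactly x.
rng = np.random.default_rng(0)
x_ref = np.zeros(3)
x = np.array([0.2, -0.1, 0.2])
estimate = resque_gradient(lambda y: y, x, x_ref, 1.0, 200_000, rng)
```

Because every query point depends only on `x_ref`, the same batch of queries can be reused to estimate gradients at many nearby points `x`. The weights stay well behaved only while `x` remains close to `x_ref` relative to `rho`.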
We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model. To the best of our knowledge, despite various empirical successes, prior to this work it was unclear if such a cost-driven latent model learner enjoys finite-sample guarantees. Our work underscores the value of predicting multi-step costs, an idea that is key to our theory, and notably also an idea that is known to be empirically valuable for learning state representations.
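For reference, the LQG setting mentioned above can be written in its textbook time-invariant form as follows. The notation ($A$, $B$, $C$, $Q$, $R$, and the noise covariances) is assumed here for illustration and may differ from the paper's, which may for example use a time-varying or finite-horizon variant.

```latex
\begin{aligned}
x_{t+1} &= A x_t + B u_t + w_t, & w_t &\sim \mathcal{N}(0, \Sigma_w), \\
y_t &= C x_t + v_t, & v_t &\sim \mathcal{N}(0, \Sigma_v), \\
c_t &= x_t^\top Q x_t + u_t^\top R u_t, &
J &= \mathbb{E}\Big[ \sum_{t=0}^{T-1} c_t \Big].
\end{aligned}
```

The controller observes only $y_t$, not the state $x_t$. In the cost-driven approach the abstract describes, the latent model is trained to predict the costs $c_t$ rather than to reconstruct the observations $y_t$.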